An ANOVA-Like Rasch Analysis of Differential Item Functioning



Abstract 

The conventional two-group DIF analysis is extended to an ANOVA-like DIF 
analysis where multiple factors with multiple groups are compared simultaneously. 
Moreover, DIF is treated as a parameter to be estimated rather than simply a sign to be 
detected. This proposed approach allows us to investigate the effects of DIF on items 
more thoroughly. Results of simulation studies show that the parameters of the 
proposed models were recovered very well. A real data set with ten dichotomous items 
was analyzed. Implications and applications are addressed. 



Keywords: differential item functioning, Rasch model, ANOVA, factorial design, marginal maximum likelihood estimation.



Item response theory (IRT) has been widely used to detect differential item
functioning (DIF). Lord (1980) has pointed out that item characteristic curves are
ideally suited to defining DIF. Since item parameters as well as person parameters 
determine the curves, the detection of DIF could be made by comparing item parameters 
between some focal group and some reference group. More specifically, within the 
framework of the Rasch model (Rasch, 1960), we can estimate the item difficulties 
separately for each group and then test their differences as follows: 

$$Z_i = \frac{\hat{b}_{iF} - \hat{b}_{iR}}{\sqrt{\mathrm{Var}(\hat{b}_{iF}) + \mathrm{Var}(\hat{b}_{iR})}},$$

where $\hat{b}_{iF}$ and $\hat{b}_{iR}$ are maximum likelihood estimates of item $i$'s difficulty for the focal
group and the reference group, respectively; $\mathrm{Var}(\hat{b}_{iF})$ and $\mathrm{Var}(\hat{b}_{iR})$ are their estimated
error variances, respectively. $Z_i$ follows approximately the standard normal distribution.
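
The two-group test above can be sketched in a few lines of Python. This is a minimal illustration, not code from the original study; the function name `dif_z` and the numeric inputs are hypothetical.

```python
import math

def dif_z(b_focal, b_ref, var_focal, var_ref):
    """Z statistic for the difference between two separately estimated
    Rasch item difficulties (focal minus reference)."""
    return (b_focal - b_ref) / math.sqrt(var_focal + var_ref)

# Hypothetical estimates (not taken from the paper): difficulties in logits
# and their estimated error variances.
z = dif_z(b_focal=0.80, b_ref=0.30, var_focal=0.02, var_ref=0.02)
flagged = abs(z) > 1.96  # two-sided test at the .05 level
print(round(z, 2), flagged)
```

The statistic only signals whether DIF is present; it does not place the size of DIF inside the model, which is the limitation the reparameterization in this paper addresses.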

Thissen, Steinberg and Gerrard (1986) and Thissen, Steinberg and Wainer (1988) 
have adopted a marginal maximum likelihood estimation (MML) to investigate DIF. 
They used the usual likelihood ratio test to compare a full model, where different item 
difficulty parameters are used for different groups, with a reduced model, where different
groups yield the same item difficulty parameters. 

The above conventional approaches have two shortcomings. First, the differences 
of the item difficulty parameters between focal and reference groups are to be tested 
rather than parameterized within the models. Therefore, the influences of DIF are not 
well investigated. Second, these item difficulty parameters are group-dependent. 
None of them can be treated as “item difficulty”. 

These shortcomings can be overcome by reparameterization of these item difficulty 
parameters. For example, the two item difficulty parameters (one for the focal group
and the other for the reference group) can be reparameterized as one “grand item
difficulty” and one DIF parameter. The grand item difficulty is in fact a weighted 
average of the two item difficulty parameters. The DIF parameter is the deviation of the
item difficulty of the reference group from the grand item difficulty. It depicts how DIF
influences the item characteristic curves. In this paper, the reparameterization is 
addressed. 

Conventionally, DIF analysis focuses on two groups: one focal group and one 
reference group. This is analogous to the t-test of two means. As the t-test is extended
to simple or factorial analysis of variance (ANOVA), DIF analysis can be extended to
multiple factors with multiple groups. In this paper, an ANOVA-like Rasch DIF
analysis is proposed. The proposed factorial DIF analysis has two major advantages.
First, as ANOVA is statistically more powerful than the t-test, the ANOVA-like DIF
analysis is more powerful than the conventional DIF analysis. Second, as main effects 
and interaction effects can be partitioned and investigated in ANOVA, they can also be 
done in the ANOVA-like DIF analysis, which in turn makes DIF analysis more thorough. 

In the following, I give a detailed description of the proposed modeling. Results of
simulation studies for parameter recovery are then shown. Finally, a real data set is
analyzed to illustrate implications and applications of the proposed modeling.

Reparameterization of Item Parameters 

Let there be one focal group and one reference group. In the terminology of 
ANOVA, this is a one-factor design. We can estimate the item difficulty parameters for 
each group. Within the Rasch model, it follows: 

$$\log(p/q)_1 = \theta_n - \delta_{i1}, \tag{1a}$$

$$\log(p/q)_2 = \theta_n - \delta_{i2}, \tag{1b}$$

where $p$ is the probability of a correct answer of person $n$ to item $i$; $q$ is that of an
incorrect answer of person $n$ to item $i$; $\theta_n$ is person $n$'s ability; $\delta_{i1}$ is item $i$'s difficulty for
the reference group (subscript 1); $\delta_{i2}$ is item $i$'s difficulty for the focal group (subscript 2).

These item parameters can be reparameterized as 

$$\delta_{ij} = \delta_i + \alpha_{ij}, \tag{2}$$

subject to

$$\sum_j \alpha_{ij} = 0.$$

Equations (1a) and (1b) become

$$\log(p/q)_1 = \theta_n - (\delta_i + \alpha_{i1}), \tag{3a}$$

$$\log(p/q)_2 = \theta_n - (\delta_i + \alpha_{i2}), \tag{3b}$$

respectively. In the case of two groups, $\alpha_{i1} = -\alpha_{i2}$. $\delta_i$ can be viewed as the grand item
difficulty of item $i$. $\alpha_{ij}$ represents the effect of group $j$ on item $i$'s difficulty and is
referred to as a DIF parameter. If $\alpha_{ij}$ is significantly different from zero, the item
expresses DIF. To test this hypothesis, on one hand, we can compare the ratio of $\alpha_{ij}$ over
its estimated standard error to the standard normal distribution. On the other hand, we
can adopt the likelihood ratio test to compare two nested models: a full model with DIF
parameters and a reduced model without DIF parameters.
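
As a small numerical illustration of this reparameterization, the sketch below splits two group-specific difficulties into a grand difficulty and two DIF parameters under the sum-to-zero constraint. The function name `reparameterize` and the input values are hypothetical, chosen only for illustration.

```python
def reparameterize(delta_ref, delta_focal):
    """Split two group difficulties into a grand difficulty and DIF parameters.

    Under the constraint alpha_i1 + alpha_i2 = 0, the grand difficulty is the
    simple mean of the two group difficulties.
    """
    grand = (delta_ref + delta_focal) / 2.0
    alpha_ref = delta_ref - grand      # alpha_i1
    alpha_focal = delta_focal - grand  # alpha_i2 = -alpha_i1
    return grand, alpha_ref, alpha_focal

# Hypothetical difficulties in logits (not from the paper).
grand, a1, a2 = reparameterize(delta_ref=0.20, delta_focal=0.60)
print(grand, a1, a2)
```

Because the two DIF parameters mirror each other, only one of them is free, and the difference in difficulty between the two groups is twice its absolute value.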

Equation (2) is analogous to one-way ANOVA. It can be extended to factorial 
ANOVA. For example, let there be two factors: Factor A, indexed j = 1, ..., J (e.g., race)
and Factor B, indexed k = 1, ..., K (e.g., gender). More specifically, let there be four
groups: White Male (j = 1, k = 1), Color Male (j = 2, k = 1), White Female (j = 1, k = 2),
and Color Female (j = 2, k = 2). We could estimate the item difficulty parameters for
each group as follows:

$$\log(p/q)_{11} = \theta_n - \delta_{i11}, \tag{4a}$$

$$\log(p/q)_{21} = \theta_n - \delta_{i21}, \tag{4b}$$

$$\log(p/q)_{12} = \theta_n - \delta_{i12}, \tag{4c}$$

$$\log(p/q)_{22} = \theta_n - \delta_{i22}, \tag{4d}$$

where $\delta_{i11}$ is item $i$'s difficulty parameter for White Male; $\delta_{i21}$ is that for Color Male; $\delta_{i12}$ is
that for White Female; $\delta_{i22}$ is that for Color Female; the others are defined as above.

Like the reparameterization of Equation (2), these item difficulty parameters can be 
reparameterized as 

$$\delta_{ijk} = \delta_i + \alpha_{ij} + \beta_{ik} + (\alpha\beta)_{ijk}, \tag{5}$$

subject to

$$\sum_j \alpha_{ij} = 0,$$

$$\sum_k \beta_{ik} = 0,$$

and

$$\sum_j (\alpha\beta)_{ijk} = \sum_k (\alpha\beta)_{ijk} = 0.$$

Equations (4a) to (4d) become

$$\log(p/q)_{11} = \theta_n - (\delta_i + \alpha_{i1} + \beta_{i1} + (\alpha\beta)_{i11}), \tag{6a}$$

$$\log(p/q)_{21} = \theta_n - (\delta_i + \alpha_{i2} + \beta_{i1} + (\alpha\beta)_{i21}), \tag{6b}$$

$$\log(p/q)_{12} = \theta_n - (\delta_i + \alpha_{i1} + \beta_{i2} + (\alpha\beta)_{i12}), \tag{6c}$$

$$\log(p/q)_{22} = \theta_n - (\delta_i + \alpha_{i2} + \beta_{i2} + (\alpha\beta)_{i22}), \tag{6d}$$

respectively. Consequently, $\delta_i$ can be viewed as the grand difficulty of item $i$, $\alpha_{ij}$ as the
effect of Factor $A_j$, $\beta_{ik}$ as the effect of Factor $B_k$, and $(\alpha\beta)_{ijk}$ as the interaction effect of
Factor $A_j$ and Factor $B_k$ on item $i$. In the case of two levels in each factor,

$$\alpha_{i1} = -\alpha_{i2},$$

$$\beta_{i1} = -\beta_{i2},$$

$$(\alpha\beta)_{i11} = -(\alpha\beta)_{i12} = -(\alpha\beta)_{i21} = (\alpha\beta)_{i22}.$$

We can test if these DIF parameters are significantly different from zero by using
the standard normal distribution or the likelihood ratio test. We may find that some items
yield all kinds of DIF effects, some items yield only the interaction effect, some
items yield Factor A's main effect, and some items yield Factor B's main effect,
among others.
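
The factorial decomposition in Equation (5) follows the same arithmetic as cell means in a balanced two-way ANOVA. The following sketch, which uses hypothetical difficulties and a hypothetical helper `decompose`, recovers the grand difficulty, the two main-effect DIF parameters, and the interaction DIF parameters from the four group-specific difficulties of a single item.

```python
def decompose(delta):
    """Decompose group-specific difficulties of one item.

    delta[j][k] is the item difficulty for level j of Factor A and level k
    of Factor B. Returns (grand, alpha, beta, interaction), all satisfying
    the sum-to-zero constraints.
    """
    J, K = len(delta), len(delta[0])
    grand = sum(sum(row) for row in delta) / (J * K)
    alpha = [sum(delta[j]) / K - grand for j in range(J)]
    beta = [sum(delta[j][k] for j in range(J)) / J - grand for k in range(K)]
    inter = [[delta[j][k] - grand - alpha[j] - beta[k] for k in range(K)]
             for j in range(J)]
    return grand, alpha, beta, inter

# Hypothetical difficulties (logits) for a 2 x 2 design; not from the paper.
delta = [[0.9, 0.3],
         [0.1, 0.3]]
grand, alpha, beta, inter = decompose(delta)
print(grand, alpha, beta, inter)
```

In an actual analysis these quantities are estimated jointly within the model rather than computed from separately estimated difficulties; the sketch only shows how the parameters relate to one another.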

Consider the influences of DIF parameters. At the Educational Testing Service,
items are classified into three categories on the basis of the Mantel-Haenszel delta difference
(MH D-DIF). Category A contains the items with negligible or nonsignificant MH D-DIF.
Category B contains the items with slight to moderate values of MH D-DIF.
Category C contains the items with moderate to large values of MH D-DIF. Basically,
if the absolute value of MH D-DIF is less than 1.0, the item goes to Category A. If it is
at least 1.0 but less than 1.5, the item belongs to Category B. Finally, if it is 1.5 or more,
the item is classified as Category C.

Except for hard or easy items, a difference of 1.0 MH D-DIF is very roughly equal
to a difference of .1 in probability of correct answer between groups. Likewise, a
difference of 1.5 MH D-DIF is very roughly equal to a difference of .15 in probability of
correct answer between groups. In terms of the logit scale, 1.0 and 1.5 deltas
correspond to roughly .43 and .64 logits, respectively. Applying these criteria, three
categories could also be formed. If the difference of item difficulties of two groups (see
Equation (5)) is not significantly different from zero or it is less than .43, the item
belongs to Category A. If it is between .43 and .64, the item belongs to Category B.
Finally, if it is .64 or above, the item belongs to Category C.
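
The classification rule can be written as a short function. This is a sketch under stated assumptions: the names `delta_to_logit` and `classify_dif` are hypothetical, and the delta-to-logit conversion assumes the usual definition of MH D-DIF as -2.35 times the log of the Mantel-Haenszel common odds ratio, so that one delta unit is about 1/2.35 logits.

```python
def delta_to_logit(mh_d_dif):
    """Convert an MH D-DIF value to logits (assumes MH D-DIF = -2.35 * ln(odds ratio))."""
    return abs(mh_d_dif) / 2.35

def classify_dif(diff_logits, significant=True):
    """Classify a between-group difficulty difference (in logits) as A, B, or C."""
    size = abs(diff_logits)
    if not significant or size < 0.43:
        return "A"  # negligible or nonsignificant
    if size < 0.64:
        return "B"  # slight to moderate
    return "C"      # moderate to large

print(round(delta_to_logit(1.0), 2), round(delta_to_logit(1.5), 2))
print(classify_dif(0.30), classify_dif(0.50), classify_dif(0.70))
```

Note that with two groups the difference in difficulty is twice the absolute value of the DIF parameter, so it is the doubled value that should be passed to the classifier.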

To make the proposed models possible, a multidimensional random coefficients
multinomial logit model is used and addressed in the following.

The Multidimensional Random Coefficients Multinomial Logit Model 

The multidimensional random coefficients multinomial logit model (MRCML;
Adams, Wilson, & Wang, 1997) is a multidimensional extension of the random
coefficients multinomial logit model (Adams & Wilson, 1996). The MRCML model
has two levels. At the second level, a population model $f_\theta(\theta; \alpha)$ is formed, where $\theta$ is
a vector of latent variables and $\alpha$ is a set of parameters that characterize the distribution
of $\theta$. In the case of multivariate normal distribution, $\alpha$ becomes a mean vector and a
variance-covariance matrix. At the first level, a conditional item response model
$f_x(x; \xi \mid \theta)$ is formed, where $x$ is a vector of observations on items, $\xi$ is a vector of
parameters that describe those items, and $\theta$ is a vector of latent variables. The
conditional item response model describes the probability of observing a set of item
responses conditioned on the level of an individual on the set of latent variables.
The Conditional Item Response Model

Suppose a set of $D$ latent traits underlies the examinees' test performances and
examinees' positions are denoted $\theta = (\theta_1, \ldots, \theta_D)'$. Let there be $I$ items indexed $i = 1, \ldots, I$,
and $K_i$ response categories in item $i$ indexed $k = 1, \ldots, K_i$. A response in category $k$
of item $i$ is scored $b_{ikd}$ on dimension $d$ (the scoring schema is known a priori). The scores
across $D$ dimensions can be collected into a column vector $b_{ik} = (b_{ik1}, \ldots, b_{ikD})'$, then into
a scoring sub-matrix for item $i$, $B_i = (b_{i1}, \ldots, b_{iK_i})'$, and then into a scoring matrix
$B = (B_1', \ldots, B_I')'$ for the whole test.

Let $\xi = (\xi_1, \ldots, \xi_p)'$ denote a vector of $p$ free item parameters. Let a design vector
$a_{ik}$ denote a linear combination of $\xi$ corresponding to response category $k$ of item $i$.
They are denoted by a design matrix $A = (a_{11}, a_{12}, \ldots, a_{1K_1}, a_{21}, \ldots, a_{2K_2}, \ldots, a_{IK_I})'$
for the whole test. Let an indicator variable be denoted as

$$X_{nik} = \begin{cases} 1 & \text{if response of person } n \text{ to item } i \text{ is in category } k, \\ 0 & \text{otherwise.} \end{cases}$$

Under the MRCML model, the probability of a response in category $k$ of item $i$ for person
$n$ is expressed as

$$f(X_{nik} = 1; A, B, \xi \mid \theta_n) = \frac{\exp(b_{ik}'\theta_n + a_{ik}'\xi)}{\sum_{u=1}^{K_i} \exp(b_{iu}'\theta_n + a_{iu}'\xi)}.$$
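
The response probability above is a softmax over the categories of an item. The sketch below evaluates it for a single item; the function name `category_probs` and the example matrices are hypothetical and only illustrate how a dichotomous Rasch item can be expressed through the scoring and design vectors.

```python
import math

def category_probs(B_i, A_i, xi, theta):
    """Category probabilities for one item under the MRCML form.

    B_i: list of scoring vectors b_ik (one per category, length D).
    A_i: list of design vectors a_ik (one per category, length p).
    xi: item parameter vector (length p); theta: latent trait vector (length D).
    """
    logits = [
        sum(b * t for b, t in zip(b_ik, theta)) + sum(a * x for a, x in zip(a_ik, xi))
        for b_ik, a_ik in zip(B_i, A_i)
    ]
    m = max(logits)  # subtract the maximum for numerical stability
    expd = [math.exp(v - m) for v in logits]
    total = sum(expd)
    return [e / total for e in expd]

# Hypothetical dichotomous item with difficulty 0.5 and a person at theta = 1.0.
# Category 0 (incorrect) scores 0; category 1 (correct) scores 1 and has
# design entry -1, so its logit is theta - difficulty.
probs = category_probs(B_i=[[0], [1]], A_i=[[0], [-1]], xi=[0.5], theta=[1.0])
print(probs)
```

With this setup the probability of a correct answer reduces to the familiar Rasch expression exp(theta - difficulty) / (1 + exp(theta - difficulty)).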



A marginal maximum likelihood estimation with the EM algorithm (Bock & Aitkin,
1981) is developed. The proposed models in this study are all derived from the
MRCML by manipulating the design matrices A and B, although they are actually
unidimensional. For example, suppose two items are administered to four groups:
White Male, Color Male, White Female, and Color Female. These two items are
rearranged into eight virtual group-items. The left panel of Table 1 shows memberships
and item responses of four persons. The right panel of Table 1 shows the rearranged
responses of the eight virtual items, where V.1 indicates virtual item 1, V.2 indicates
virtual item 2, and so on. V.1 to V.4 belong to the first original item, and V.5 to V.8 to
the second. In addition, V.1 and V.5 go with White Male; V.2 and V.6 with Color
Male; V.3 and V.7 with White Female; V.4 and V.8 with Color Female. The other cells
are blank and treated as missing.

Table 1
Original item responses and rearranged item responses

                 Original items     Eight virtual items
Group            Item 1   Item 2    V.1   V.2   V.3   V.4   V.5   V.6   V.7   V.8
White Male         1        0        1                       0
Color Male         0        1              0                       1
White Female       0        1                    0                       0
Color Female       1        1                          1                       1

Note: values are hypothetical scores.
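
The rearrangement into virtual group-items can be sketched as follows. The helper `to_virtual_items` is hypothetical, and `None` stands in for the blank (missing) cells; the layout assumes, as described above, that the virtual items of the first original item come first and that groups are ordered White Male, Color Male, White Female, Color Female.

```python
def to_virtual_items(responses, group, n_groups):
    """Spread one person's item responses over item-by-group virtual items.

    responses: list of item scores for one person.
    group: 0-based index of the person's group.
    Returns a list of length n_items * n_groups with None for missing cells.
    """
    row = [None] * (len(responses) * n_groups)
    for i, score in enumerate(responses):
        row[i * n_groups + group] = score
    return row

# Hypothetical scores on two items for one person in each of four groups.
white_male = to_virtual_items([1, 0], group=0, n_groups=4)
color_female = to_virtual_items([1, 1], group=3, n_groups=4)
print(white_male)    # [1, None, None, None, 0, None, None, None]
print(color_female)  # [None, None, None, 1, None, None, None, 1]
```

Each person therefore answers only as many virtual items as there are original items, and all remaining virtual items are missing by design.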



In DIF analysis, the means of the distributions for various groups are usually quite
different. Since in MML estimation with the normal case only a grand mean is assumed,
we have to parameterize the differences among means for the groups. For four groups,
there could be three parameters: one for the main effect of Factor A (i.e., difference
between the race groups), one for the main effect of Factor B (i.e., difference between the
gender groups), and the other for the interaction effect of Factors A and B (i.e., difference
between the race by gender groups). In addition to these three “mean-difference”
parameters, there is one grand item difficulty for Item 1 (the other grand item difficulty
for Item 2 is constrained for model identification). Moreover, there are one Race DIF
parameter, one Gender DIF parameter, and one Race by Gender DIF parameter (the
corresponding DIF parameters for Item 2 are constrained for model identification).
Altogether seven item parameters are formed.

Figure 1 shows the corresponding scoring matrix B and the design matrix A. In
Figure 1, $\xi_1$ indicates the mean difference between the race groups (main effect of Factor
A); $\xi_2$ indicates the mean difference between the gender groups (main effect of Factor B);
$\xi_3$ indicates the mean difference among the race by gender groups (interaction effect of
Factors A and B). The mean of each group can be found from

$$\mu_{jk} = \mu + \zeta^{A}_{j} + \zeta^{B}_{k} + \zeta^{AB}_{jk},$$

where $\mu_{jk}$ stands for the mean for group $jk$; $\mu$ stands for the grand mean. Moreover,
given two factors with two levels on each,

$\zeta^{A}_{j} = \xi_1$ if $j = 1$; $\zeta^{A}_{j} = -\xi_1$ if $j = 2$,

$\zeta^{B}_{k} = \xi_2$ if $k = 1$; $\zeta^{B}_{k} = -\xi_2$ if $k = 2$,

$\zeta^{AB}_{jk} = \xi_3$ if ($j = 1$, $k = 1$) or ($j = 2$, $k = 2$); $\zeta^{AB}_{jk} = -\xi_3$ otherwise.

For example, White Male ($j = 1$ and $k = 1$) has a mean of $\mu_{11} = \mu + \xi_1 + \xi_2 + \xi_3$; Color Male
($j = 2$ and $k = 1$) has a mean of $\mu_{21} = \mu - \xi_1 + \xi_2 - \xi_3$; White Female ($j = 1$ and $k = 2$) has a
mean of $\mu_{12} = \mu + \xi_1 - \xi_2 - \xi_3$; Color Female ($j = 2$ and $k = 2$) has a mean of $\mu_{22} = \mu - \xi_1 - \xi_2 + \xi_3$.
In practice, we are not interested in testing the differences of the means among
groups because they are usually expected to be different.

Regarding other item parameters in Figure 1, $\xi_4$ is the grand difficulty of the first
item and $-\xi_4$ is the grand difficulty of the second item, because the mean of all the grand
item difficulties is constrained to zero for model identification. $\xi_5$ is a Factor A DIF
parameter, the main effect of Factor A on item difficulties. $\xi_6$ is a Factor B DIF
parameter, the main effect of Factor B on item difficulties. $\xi_7$ is a Factors A by B DIF
parameter, the interaction effect of Factors A and B on item difficulties. Item $i$'s
difficulty for group $jk$ can be found from

$$\delta_{ijk} = \delta_i + \alpha_{ij} + \beta_{ik} + (\alpha\beta)_{ijk},$$

where

$\delta_i = \xi_4$ if $i = 1$; $\delta_i = -\xi_4$ if $i = 2$,

$\alpha_{ij} = \xi_5$ if ($i = 1$, $j = 1$) or ($i = 2$, $j = 2$); $\alpha_{ij} = -\xi_5$ otherwise,

$\beta_{ik} = \xi_6$ if ($i = 1$, $k = 1$) or ($i = 2$, $k = 2$); $\beta_{ik} = -\xi_6$ otherwise,

$(\alpha\beta)_{ijk} = \xi_7$ if ($j = 1$, $k = 1$) or ($j = 2$, $k = 2$), when $i = 1$; if ($j = 2$, $k = 1$) or ($j = 1$, $k = 2$),
when $i = 2$; $(\alpha\beta)_{ijk} = -\xi_7$ otherwise.

In fact, the computation of both the mean and the item difficulty of each group is similar 
to that of sample means in a factorial ANOVA design. 
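
For the two-item, four-group example, the sign rules above can be collapsed into products of +1/-1 codes. The sketch below is one way to express this; the function names and the parameter values are hypothetical, and the dictionary keys 1 to 7 follow the parameter numbering used for Figure 1 in the text.

```python
def sign(level):
    """Effect code: +1 for the first level (or item), -1 for the second."""
    return 1 if level == 1 else -1

def group_mean(mu, xi, j, k):
    """Mean of group (j, k) from the grand mean and xi[1], xi[2], xi[3]."""
    return mu + sign(j) * xi[1] + sign(k) * xi[2] + sign(j) * sign(k) * xi[3]

def item_difficulty(xi, i, j, k):
    """Difficulty of item i for group (j, k) from xi[4] to xi[7]."""
    return sign(i) * (xi[4]
                      + sign(j) * xi[5]
                      + sign(k) * xi[6]
                      + sign(j) * sign(k) * xi[7])

# Hypothetical parameter values (not estimates from the paper).
xi = {1: 0.40, 2: -0.10, 3: 0.20, 4: 1.00, 5: 0.30, 6: 0.10, 7: 0.05}
print(group_mean(0.0, xi, j=1, k=1))      # mu + xi1 + xi2 + xi3
print(item_difficulty(xi, i=1, j=2, k=1)) # xi4 - xi5 + xi6 - xi7
```

Because every parameter of the second item is the negative of the corresponding parameter of the first item, the two difficulties sum to zero within each group, which is the identification constraint described above.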
Figure 1. Scoring matrix B and design matrix A for two items administered to four
groups



The model in Figure 1 is a full model because all possible item parameters are 
estimated. We can form some reduced models by discarding some of the DIF 
parameters, for example, a model without interaction DIF. In addition, we can form 
models where all items, a subset of items, or no items express some kinds of DIF. 
Through model comparison, we can test if these DIF parameters are statistically 
significant. Although Figure 1 is an example of dichotomous items, the MRCML can 
be directly applied to polytomous items. Interested readers are referred to Adams,
Wilson, and Wang (1997) and Wang, Wilson, and Adams (1997) for details of how the two
matrices are manipulated to form various models. The computer software ConQuest
(Wu, Adams, & Wilson, 1997) could be used to estimate the parameters. 
Simulation Studies



The design of the simulation studies is based on the real data analyses in the
following section. A two-way factorial design with two levels on each factor was adopted,
which leads to four groups. The sample sizes of these four groups are 214, 294, 83, and
182. There are ten dichotomous items in the test. Two conditions were simulated:
one is a full model with all possible DIF parameters (see Figure 1) and the other is a
reduced model with a few DIF parameters. Fifty replications were made under each
condition.

In the full model, altogether 41 parameters were estimated, including two person 
distribution parameters (one grand mean parameter and one variance parameter), three 
mean-difference parameters, nine grand item difficulty parameters, nine Factor A DIF 
parameters, nine Factor B DIF parameters, and nine Factors A by B DIF parameters. 
Table 2 shows the generating values, the bias values (mean of fifty replications minus
generating value), the standard errors, and the Z statistics (bias values divided by
standard errors). According to the Z statistics, no parameters are statistically biased at
the .05 level. In addition, all the parameters are recovered very well with the bias values 
between -.064 and .056. 
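
The recovery summary for a single parameter can be sketched as below. This is an illustration under an assumption: the text does not state how the standard errors in Tables 2 and 3 were obtained, so the sketch takes the standard error to be the standard deviation of the estimates across replications. The function name `recovery_summary` and the example estimates are hypothetical.

```python
import statistics

def recovery_summary(estimates, generating_value):
    """Bias, standard error, and Z for one parameter across replications.

    Assumption: SE is the standard deviation of the replicated estimates.
    """
    bias = statistics.mean(estimates) - generating_value
    se = statistics.stdev(estimates)
    return bias, se, bias / se

# Hypothetical estimates from four replications (the study used fifty).
bias, se, z = recovery_summary([0.40, 0.50, 0.60, 0.50], generating_value=0.46)
print(round(bias, 3), round(se, 3), round(z, 2))
```

A parameter would be regarded as statistically biased at the .05 level when the absolute value of Z exceeds 1.96.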

Under the reduced model, altogether 17 parameters were estimated, including two
person distribution parameters, three mean-difference parameters, nine grand item
difficulty parameters, and three Factor A DIF parameters (only three items show Factor 
A DIF). The results are summarized in Table 3. No parameters are statistically biased. 
All the parameters are recovered very well with the bias values between -.021 and .020. 
Table 2
Generating values, bias values, standard errors, and Z statistics of various
parameters in the full model (Model 1)

Parameter                Generating     Bias       SE        Z
Mean-difference
  1                          .46        .030      .074      .41
  2                         -.06       -.007      .079     -.09
  3                         -.16       -.017      .093     -.19
Grand Item Difficulty
  1                        -3.84       -.064      .196     -.33
  2                        -3.42       -.014      .139     -.10
  3                         -.70        .002      .117      .02
  4                         -.42       -.015      .107     -.14
  5                        -1.03       -.035      .125     -.28
  6                         1.06       -.018      .119     -.15
  7                          .34        .019      .126      .15
  8                         1.83        .021      .132      .16
  9                         2.65        .056      .181      .31
Factor A DIF
  1                         -.16       -.004      .184     -.02
  2                          .32       -.019      .129     -.15
  3                         -.08        .044      .113      .39
  4                          .18        .024      .106      .23
  5                         -.28        .022      .113      .20
  6                          .02        .016      .128      .13
  7                         -.19        .023      .114      .20
  8                         -.04       -.009      .107     -.08
  9                          .04       -.013      .203     -.06
Factor B DIF
  1                          .03       -.002      .183     -.01
  2                         -.21        .014      .144      .10
  3                          .01       -.010      .106     -.10
  4                         -.05        .006      .102      .05
  5                          .15        .004      .106      .03
  6                         -.01        .004      .123      .03
  7                          .23       -.018      .113     -.16
  8                          .09       -.042      .173     -.24
  9                          .05        .008      .159      .05
Factors A by B DIF
  1                          .36        .014      .193      .07
  2                          .10       -.017      .141     -.12
  3                          .02       -.023      .095     -.24
  4                          .08       -.015      .092     -.16
  5                          .08        .007      .102      .07
  6                          .15        .005      .102      .05
  7                          .10       -.024      .132     -.18
  8                         -.14        .009      .111      .08
  9                         -.14       -.011      .217     -.05
Grand Mean                 -1.11       -.016      .078     -.21
Variance                    2.70       -.028      .244     -.12
Table 3
Generating values, bias values, standard errors, and Z statistics of various
parameters in the reduced model (Model 4)

Parameter                Generating     Bias           SE        Z
Mean-difference
  1                          .49        [illegible]   .067     -.02
  2                         -.06        [illegible]   .077     -.20
  3                         -.23        .000          .081      .00
Grand Item Difficulty
  1                        -3.74        .009          .141      .06
  2                        -3.48        .008          .119      .07
  3                         -.69       -.021          .093     -.22
  4                         -.44        .018          .081      .22
  5                         -.98        .002          .093      .02
  6                         1.05       -.008          .112     -.07
  7                          .43       -.006          .102     -.06
  8                         1.84        .003          .129      .02
  9                         2.65        .001          .191      .01
Factor A DIF
  1                          .28       -.010          .084     -.11
  2                          .19       -.007          .083     -.08
  3                         -.24        .020          .090      .22
Grand Mean                 -1.11        .005          .082      .07
Variance                    2.71       -.017          .195     -.08



Real Data Analyses 

A personality test with ten dichotomous items from Wang (1997) was analyzed.
Subjects are 773 secondary school teachers and college students, including 214 female
teachers, 294 female students, 83 male teachers, and 182 male students. There are two
factors: status (teacher and student) and gender (male and female). We are interested in
whether the items show Status DIF, Gender DIF, or Status by Gender DIF. To investigate this,
several models were formed. Model 1 is a full model with 41 parameters. It has a
deviance $G^2$ of 6122.02. The estimated parameters are shown in Table 2 as the
generating values. To test if the items show Status by Gender DIF, all the nine
corresponding parameters were constrained to zero. The resulting model, Model 2, has
a deviance of 6135.88. The difference between these two models is not statistically
significant based on the likelihood ratio test. Therefore, no items show Status by Gender DIF.

Further, to investigate Gender DIF, all the nine corresponding DIF parameters were
constrained to zero. The resulting model, Model 3, is nested within Model 2 and
has a deviance of 6147.96. Again, the likelihood ratio test is adopted to compare
Models 2 and 3. The difference is not statistically significant, so Model 3 is preferred.
Thus, no items show Gender DIF. I further constrained all the nine Status DIF parameters
to zero to test if the items show Status DIF. The resulting model, Model 0, is a model
without any DIF and has a deviance of 6168.88. Comparing Models 0 and 3, we find that
they differ significantly. That is, at least one item shows Status DIF.

According to the estimated standard errors of the parameters in Model 3, as shown 
in Table 4, Items 2, 4, and 5 might have significant Status DIF effects. To investigate 
this, only the three DIF parameters are estimated and the other six DIF parameters are 
constrained to zero. The resulting model, Model 4, has a deviance of 6151.43. This
model is not statistically different from Model 3. Therefore, only the three items
express Status DIF. The estimated parameters in Model 4 are listed as the generating
values in Table 3. Figure 2 shows the likelihood ratio tests for these five models. 
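
The likelihood ratio tests reported in this section can be reproduced from the deviances given in the text. The sketch below uses only the Python standard library; the helper `chi2_sf` is a hypothetical name for a closed-form upper-tail chi-square probability that is valid for integer degrees of freedom, and the degrees of freedom are the numbers of constrained DIF parameters stated above.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)             # Q(1, h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))  # Q(1/2, h)
    target = df / 2.0
    while a < target - 1e-9:
        # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q

# Deviances reported in the text.
deviance = {1: 6122.02, 2: 6135.88, 3: 6147.96, 4: 6151.43, 0: 6168.88}

# (fuller model, reduced model, number of parameters constrained to zero)
comparisons = [(1, 2, 9), (2, 3, 9), (3, 0, 9), (3, 4, 6), (4, 0, 3)]

results = {}
for full, reduced, df in comparisons:
    g2 = deviance[reduced] - deviance[full]
    results[(full, reduced)] = (g2, df, chi2_sf(g2, df))
    print(f"Model {full} vs. Model {reduced}: G2 = {g2:.2f}, df = {df}, "
          f"p = {results[(full, reduced)][2]:.3f}")
```

A nonsignificant difference favors the reduced model, which is how the sequence of comparisons arrives at Model 4.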

The Status DIF parameters of Items 2, 4, and 5 are .28, .19, and -.24, respectively.
Since the teachers are indexed in front of the students, Item 2 is .56 (= 2 × .28) more
difficult for the teachers than for the students. Likewise, Item 4 is .38 (= 2 × .19) more
difficult for the teachers than for the students. Item 5 is .48 (= 2 × .24) easier for the
teachers than for the students. Based on the classification stated above, Items 2 and 5
belong to Category B (slight to moderate effect); Item 4 and the other items belong to
Category A (negligible effect). With this information, test developers and test users can
gain a deeper understanding of how items function across groups.

Table 4
Parameter estimates and their standard errors in Model 3

Parameter                 Estimate      SE
Mean-difference
  1                          .48        .06
  2                         -.06        .07
  3                         -.23        .09
Grand Item Difficulty
  1                        -3.75        .19
  2                        -3.47        .12
  3                         -.70        .10
  4                         -.43        .09
  5                         -.97        .11
  6                         1.06        .11
  7                          .42        .12
  8                         1.84        .13
  9                         2.65        .15
Factor A DIF
  1                         -.02        .19
  2                          .32        .11
  3                         -.07        .11
  4                          .20        .09
  5                         -.23        .09
  6                          .07        .11
  7                         -.14        .10
  8                         -.07        .13
  9                          .00        .16
  10                        -.07        .23
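
The screening of items by their standard errors can be illustrated with the Factor A (Status) DIF estimates from Table 4. The sketch below divides each estimate by its standard error and flags ratios beyond 1.96 in absolute value; the 1.96 cutoff is an assumption made for this illustration, since the text only says that these items "might have" significant effects.

```python
# Factor A (Status) DIF estimates and standard errors from Table 4: item -> (estimate, SE).
status_dif = {
    1: (-0.02, 0.19), 2: (0.32, 0.11), 3: (-0.07, 0.11), 4: (0.20, 0.09),
    5: (-0.23, 0.09), 6: (0.07, 0.11), 7: (-0.14, 0.10), 8: (-0.07, 0.13),
    9: (0.00, 0.16), 10: (-0.07, 0.23),
}

# Ratio of each estimate to its standard error, compared with the standard normal.
ratios = {item: est / se for item, (est, se) in status_dif.items()}
flagged = [item for item, z in ratios.items() if abs(z) > 1.96]  # assumed .05 cutoff

for item, z in ratios.items():
    print(f"Item {item:2d}: z = {z:5.2f}")
print("Flagged items:", flagged)
```

Under this cutoff the flagged items are the same three that are retained as Status DIF items in Model 4.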
Model 1 vs. Model 2: ΔG² = 13.86, df = 9, p = .13
Model 2 vs. Model 3: ΔG² = 12.08, df = 9, p = .21
Model 3 vs. Model 0: ΔG² = 20.92, df = 9, p = .01
Model 3 vs. Model 4: ΔG² = 3.47, df = 6, p = .75
Model 4 vs. Model 0: ΔG² = 17.45, df = 3, p < .01

Figure 2. Likelihood ratio tests for the five nested models



Conclusion 



Conventional DIF analysis is usually based on comparison of two groups, which is
analogous to the t-test of two means. As the t-test is extended to ANOVA for multiple
groups and multiple factors, the conventional DIF analysis is extended to the ANOVA-like
DIF analysis in this study. Moreover, DIF is treated as a parameter to be
estimated rather than simply a sign to be detected. In doing so, a more thorough
understanding of DIF can be acquired.

Results of the simulation studies show that all the parameters were recovered very
well. A real data set with ten dichotomous items was analyzed. Various models were
formed to test if the items show Status DIF, Gender DIF, or Status by Gender DIF.
Neither Status by Gender DIF nor Gender DIF was found. However, three items show
Status DIF. Although a two-way factorial design with two levels on each
factor was illustrated in this paper, the approach can be generalized to more than two
factors with more than two levels on each. In addition, the approach is not limited to
dichotomous items. It can be easily generalized to polytomous items.